MACHINE LEARNING PROJECT

  • Sebastián Carrero Cardona

Objective

Identify which of three machine learning classification models is the most effective when applied to the public "bank_marketing" dataset.

Context

We will use this dataset to test the performance of classification models and to explore the best strategies for improving a banking institution's next direct marketing campaign.

Term deposits are cash investments held at a financial institution and are a major source of income for banks, which makes marketing them a priority. Telemarketing remains a popular technique because of the potential effectiveness of the person-to-person contact a phone call provides, often in sharp contrast to the impersonal, robotic marketing messages delivered through social and digital media. However, running this kind of direct marketing effort usually requires a large investment, since large call centers must be hired to contact customers directly.

How can the bank run more effective direct marketing campaigns in the future? Analyze this dataset and identify the patterns that will help develop future strategies.

Library imports

In [ ]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
pd.set_option('display.max_columns', None)
from ydata_profiling import ProfileReport
from sklearn.feature_selection import SelectFromModel
import seaborn as sns  # High-level visualization
from sklearn.preprocessing import OneHotEncoder
from pycaret.classification import *
from imblearn.under_sampling import RandomUnderSampler

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
import matplotlib.patches as mpatches
import matplotlib.ticker as mtick
class Colors:
    Gray = "#5d5d5d"
    LightGray = "#fafafa"
    Black = "#000000"
    White = "#FFFFFF"
    Teal = "#008080"
    Aquamarine = "#76c8c8"
    Blue = "#2596be"
    LightCyan = "#badbdb"
    WhiteSmoke = "#dedad2"
    Cream = "#e4bcad"
    PeachPuff = "#df979e"
    HotPink = "#d7658b"
    DeepPink = "#c80064"
    LightSeaGreen = "#20B2AA"
    DarkGray = "#464144"

Loading the DataFrame

In [ ]:
data = pd.read_csv('../data/raw/bank_marketing.csv', sep=';')
data
Out[ ]:
class age job marital education default housing loan contact month day_of_week duration campaign pdays previous poutcome emp.var.rate cons.price.idx cons.conf.idx euribor3m nr.employed
0 no 30.0 blue-collar married basic.9y no yes no cellular may fri 487.0 2.0 NaN 0.0 nonexistent -1.8 92.893 -46.2 1.313 5099.1
1 no 39.0 services single high.school no no no telephone may fri 346.0 4.0 NaN 0.0 nonexistent 1.1 93.994 -36.4 4.855 5191.0
2 no 25.0 services married high.school no yes no telephone jun wed 227.0 1.0 NaN 0.0 nonexistent 1.4 94.465 -41.8 4.962 5228.1
3 no 38.0 services married basic.9y no unknown unknown telephone jun fri 17.0 3.0 NaN 0.0 nonexistent 1.4 94.465 -41.8 4.959 5228.1
4 no 47.0 admin. married university.degree no yes no cellular nov mon 58.0 1.0 NaN 0.0 nonexistent -0.1 93.200 -42.0 4.191 5195.8
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4114 no 30.0 admin. married basic.6y no yes yes cellular jul thu 53.0 1.0 NaN 0.0 nonexistent 1.4 93.918 -42.7 4.958 5228.1
4115 no 39.0 admin. married high.school no yes no telephone jul fri 219.0 1.0 NaN 0.0 nonexistent 1.4 93.918 -42.7 4.959 5228.1
4116 no 27.0 student single high.school no no no cellular may mon 64.0 2.0 NaN 1.0 failure -1.8 92.893 -46.2 1.354 5099.1
4117 no 58.0 admin. married high.school no no no cellular aug fri 528.0 1.0 NaN 0.0 nonexistent 1.4 93.444 -36.1 4.966 5228.1
4118 no 34.0 management single high.school no yes no cellular nov wed 175.0 1.0 NaN 0.0 nonexistent -0.1 93.200 -42.0 4.120 5195.8

4119 rows × 21 columns

In [ ]:
data.shape
Out[ ]:
(4119, 21)

General description of the dataset

  • number of rows and columns
  • data types and whether they are correct
In [ ]:
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4119 entries, 0 to 4118
Data columns (total 21 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   class           4119 non-null   object 
 1   age             4119 non-null   float64
 2   job             4119 non-null   object 
 3   marital         4119 non-null   object 
 4   education       4119 non-null   object 
 5   default         4119 non-null   object 
 6   housing         4119 non-null   object 
 7   loan            4119 non-null   object 
 8   contact         4119 non-null   object 
 9   month           4119 non-null   object 
 10  day_of_week     4119 non-null   object 
 11  duration        4119 non-null   float64
 12  campaign        4119 non-null   float64
 13  pdays           160 non-null    float64
 14  previous        4119 non-null   float64
 15  poutcome        4119 non-null   object 
 16  emp.var.rate    4119 non-null   float64
 17  cons.price.idx  4119 non-null   float64
 18  cons.conf.idx   4119 non-null   float64
 19  euribor3m       4119 non-null   float64
 20  nr.employed     4119 non-null   float64
dtypes: float64(10), object(11)
memory usage: 675.9+ KB

Additional Information

Input variables:

bank client data:
    1. age (numeric)
    2. job: type of job (categorical: 'admin.','blue-collar','entrepreneur','housemaid','management','retired','self-employed','services','student','technician','unemployed','unknown')
    3. marital: marital status (categorical: 'divorced','married','single','unknown'; note: 'divorced' means divorced or widowed)
    4. education (categorical: 'basic.4y','basic.6y','basic.9y','high.school','illiterate','professional.course','university.degree','unknown')
    5. default: has credit in default? (categorical: 'no','yes','unknown')
    6. housing: has a housing loan? (categorical: 'no','yes','unknown')
    7. loan: has a personal loan? (categorical: 'no','yes','unknown')
related to the last contact of the current campaign:
    8. contact: contact type (categorical: 'cellular','telephone')
    9. month: month of the last contact (categorical: 'jan', 'feb', 'mar', ..., 'nov', 'dec')
    10. day_of_week: day of the week of the last contact (categorical: 'mon','tue','wed','thu','fri')
    11. duration: duration of the last contact, in seconds (numeric). Important note: this attribute highly affects the output target (e.g., if duration=0 then y='no'). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model.

other attributes:
    12. campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
    13. pdays: number of days that passed since the client was last contacted in a previous campaign (numeric; 999 means the client was not previously contacted; in the file loaded here, such rows appear to be stored as NaN instead)
    14. previous: number of contacts performed before this campaign and for this client (numeric)
    15. poutcome: outcome of the previous marketing campaign (categorical: 'failure','nonexistent','success')
social and economic context attributes:
    16. emp.var.rate: employment variation rate - quarterly indicator (numeric)
    17. cons.price.idx: consumer price index - monthly indicator (numeric)
    18. cons.conf.idx: consumer confidence index - monthly indicator (numeric)
    19. euribor3m: Euribor 3-month rate - daily indicator (numeric)
    20. nr.employed: number of employees - quarterly indicator (numeric)
Output variable (desired target):
    21. y: has the client subscribed to a term deposit? (binary: 'yes','no'). In this file the column is named `class`.
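The note on `duration` has a practical consequence: a model meant to decide whom to call cannot use it. A minimal sketch of keeping two feature sets, one for benchmarking and one realistic, on a small made-up frame (the rows and the names `benchmark_features` and `realistic_features` are illustrative assumptions, not part of the notebook):

```python
import pandas as pd

# Toy rows shaped like the dataset; the values are made up for illustration.
toy = pd.DataFrame({
    "class": ["no", "yes", "no"],
    "age": [30, 39, 25],
    "duration": [487, 346, 227],
    "campaign": [2, 4, 1],
})

target = "class"

# Benchmark setting: every column except the target.
benchmark_features = [c for c in toy.columns if c != target]

# Realistic setting: duration is only known after the call, so leave it out.
realistic_features = [c for c in benchmark_features if c != "duration"]

print(benchmark_features)
print(realistic_features)
```

Comparing a model trained on each list gives a rough sense of how much of the benchmark score depends on information that is unavailable before the call.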
In [ ]:
data["class"].value_counts()  # Check the balance of the target variable
Out[ ]:
no     3668
yes     451
Name: class, dtype: int64
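The counts above show a clear imbalance between the two classes. A short sketch of quantifying it from those two numbers (the variable names are illustrative):

```python
import pandas as pd

# Class counts as reported by value_counts() above.
counts = pd.Series({"no": 3668, "yes": 451}, name="class")

# Share of each class and the majority-to-minority ratio.
shares = counts / counts.sum()
imbalance_ratio = counts["no"] / counts["yes"]

print(shares.round(3))
print(f"about {imbalance_ratio:.1f} 'no' rows per 'yes' row")
```

With roughly one positive in nine rows, accuracy alone is a weak yardstick: always predicting 'no' would already score about 89%.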

General data quality cleaning

Exact duplicates

In [ ]:
data.duplicated().sum()
Out[ ]:
0

There are no exact duplicate rows.

Missing values

In [ ]:
data.isnull().sum()  # null count per variable
Out[ ]:
class                0
age                  0
job                  0
marital              0
education            0
default              0
housing              0
loan                 0
contact              0
month                0
day_of_week          0
duration             0
campaign             0
pdays             3959
previous             0
poutcome             0
emp.var.rate         0
cons.price.idx       0
cons.conf.idx        0
euribor3m            0
nr.employed          0
dtype: int64
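Raw null counts are easier to judge as a share of the rows. A small sketch using the counts printed above (the 50% threshold is an arbitrary assumption for illustration, not a rule taken from this notebook):

```python
import pandas as pd

n_rows = 4119

# Null counts as reported above; only a few columns are repeated here.
null_counts = pd.Series({"pdays": 3959, "age": 0, "duration": 0})

missing_share = null_counts / n_rows

# Hypothetical rule: flag columns with more than half of their values missing.
threshold = 0.5
to_review = missing_share[missing_share > threshold].index.tolist()

print(missing_share.round(3))
print(to_review)
```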
In [ ]:
# pdays is null in 3959 of the 4119 rows, so the column is dropped
data = data.drop('pdays', axis=1)
data
Out[ ]:
class age job marital education default housing loan contact month day_of_week duration campaign previous poutcome emp.var.rate cons.price.idx cons.conf.idx euribor3m nr.employed
0 no 30.0 blue-collar married basic.9y no yes no cellular may fri 487.0 2.0 0.0 nonexistent -1.8 92.893 -46.2 1.313 5099.1
1 no 39.0 services single high.school no no no telephone may fri 346.0 4.0 0.0 nonexistent 1.1 93.994 -36.4 4.855 5191.0
2 no 25.0 services married high.school no yes no telephone jun wed 227.0 1.0 0.0 nonexistent 1.4 94.465 -41.8 4.962 5228.1
3 no 38.0 services married basic.9y no unknown unknown telephone jun fri 17.0 3.0 0.0 nonexistent 1.4 94.465 -41.8 4.959 5228.1
4 no 47.0 admin. married university.degree no yes no cellular nov mon 58.0 1.0 0.0 nonexistent -0.1 93.200 -42.0 4.191 5195.8
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4114 no 30.0 admin. married basic.6y no yes yes cellular jul thu 53.0 1.0 0.0 nonexistent 1.4 93.918 -42.7 4.958 5228.1
4115 no 39.0 admin. married high.school no yes no telephone jul fri 219.0 1.0 0.0 nonexistent 1.4 93.918 -42.7 4.959 5228.1
4116 no 27.0 student single high.school no no no cellular may mon 64.0 2.0 1.0 failure -1.8 92.893 -46.2 1.354 5099.1
4117 no 58.0 admin. married high.school no no no cellular aug fri 528.0 1.0 0.0 nonexistent 1.4 93.444 -36.1 4.966 5228.1
4118 no 34.0 management single high.school no yes no cellular nov wed 175.0 1.0 0.0 nonexistent -0.1 93.200 -42.0 4.120 5195.8

4119 rows × 20 columns
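Dropping `pdays` also discards the fact that 160 rows do carry a value for it. One possible alternative, not the approach taken in this notebook, is to keep a binary flag before dropping the column. A sketch on made-up rows (the column name `contacted_before` is hypothetical):

```python
import numpy as np
import pandas as pd

# Made-up rows: NaN in pdays means no recorded earlier contact.
toy = pd.DataFrame({"pdays": [np.nan, 3.0, np.nan, 6.0]})

# 1 when a previous contact is recorded, 0 otherwise.
toy["contacted_before"] = toy["pdays"].notna().astype("int64")

# The sparse column can then be dropped as in the cell above.
toy = toy.drop("pdays", axis=1)
print(toy)
```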

In [ ]:
data.isnull().sum()
Out[ ]:
class             0
age               0
job               0
marital           0
education         0
default           0
housing           0
loan              0
contact           0
month             0
day_of_week       0
duration          0
campaign          0
previous          0
poutcome          0
emp.var.rate      0
cons.price.idx    0
cons.conf.idx     0
euribor3m         0
nr.employed       0
dtype: int64

Changing data types

In [ ]:
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4119 entries, 0 to 4118
Data columns (total 20 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   class           4119 non-null   object 
 1   age             4119 non-null   float64
 2   job             4119 non-null   object 
 3   marital         4119 non-null   object 
 4   education       4119 non-null   object 
 5   default         4119 non-null   object 
 6   housing         4119 non-null   object 
 7   loan            4119 non-null   object 
 8   contact         4119 non-null   object 
 9   month           4119 non-null   object 
 10  day_of_week     4119 non-null   object 
 11  duration        4119 non-null   float64
 12  campaign        4119 non-null   float64
 13  previous        4119 non-null   float64
 14  poutcome        4119 non-null   object 
 15  emp.var.rate    4119 non-null   float64
 16  cons.price.idx  4119 non-null   float64
 17  cons.conf.idx   4119 non-null   float64
 18  euribor3m       4119 non-null   float64
 19  nr.employed     4119 non-null   float64
dtypes: float64(9), object(11)
memory usage: 643.7+ KB
In [ ]:
# Cast count-like columns to integers.
# Note: astype('int64') truncates, so nr.employed values such as 5099.1 become 5099.
int_cols = ['age', 'duration', 'campaign', 'previous', 'nr.employed']
data[int_cols] = data[int_cols].astype('int64')
In [ ]:
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4119 entries, 0 to 4118
Data columns (total 20 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   class           4119 non-null   object 
 1   age             4119 non-null   int64  
 2   job             4119 non-null   object 
 3   marital         4119 non-null   object 
 4   education       4119 non-null   object 
 5   default         4119 non-null   object 
 6   housing         4119 non-null   object 
 7   loan            4119 non-null   object 
 8   contact         4119 non-null   object 
 9   month           4119 non-null   object 
 10  day_of_week     4119 non-null   object 
 11  duration        4119 non-null   int64  
 12  campaign        4119 non-null   int64  
 13  previous        4119 non-null   int64  
 14  poutcome        4119 non-null   object 
 15  emp.var.rate    4119 non-null   float64
 16  cons.price.idx  4119 non-null   float64
 17  cons.conf.idx   4119 non-null   float64
 18  euribor3m       4119 non-null   float64
 19  nr.employed     4119 non-null   int64  
dtypes: float64(4), int64(5), object(11)
memory usage: 643.7+ KB
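Casting floats with `astype('int64')` silently discards any fractional part, which matters for a column like `nr.employed` (5099.1, 5228.1). A hedged sketch of checking which float columns hold only whole numbers before casting (the helper name `is_integer_valued` and the toy values are assumptions for illustration):

```python
import pandas as pd

def is_integer_valued(series: pd.Series) -> bool:
    """Return True when no value in the series has a fractional part."""
    return bool((series % 1 == 0).all())

# Toy values copied from the first rows shown earlier.
toy = pd.DataFrame({
    "age": [30.0, 39.0, 25.0],
    "nr.employed": [5099.1, 5191.0, 5228.1],
})

# Only columns without fractional parts can be cast without losing information.
safe_to_cast = [c for c in toy.columns if is_integer_valued(toy[c])]
print(safe_to_cast)

toy = toy.astype({c: "int64" for c in safe_to_cast})
print(toy.dtypes)
```

Columns that fail the check can still be cast on purpose, but then the truncation is a deliberate choice instead of a side effect.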
In [ ]:
for col in data:
    print(f"{col} = {data[col].unique()}")
class = ['no' 'yes']
age = [30 39 25 38 47 32 41 31 35 36 29 27 44 46 45 50 55 40 28 34 33 51 48 20
 76 56 24 58 60 37 52 42 49 54 59 57 43 53 75 82 71 21 22 23 26 81 61 67
 73 18 64 74 77 86 85 63 88 78 72 68 80 66 19 62 65 69 70]
job = ['blue-collar' 'services' 'admin.' 'entrepreneur' 'self-employed'
 'technician' 'management' 'student' 'retired' 'housemaid' 'unemployed'
 'unknown']
marital = ['married' 'single' 'divorced' 'unknown']
education = ['basic.9y' 'high.school' 'university.degree' 'professional.course'
 'basic.6y' 'basic.4y' 'unknown' 'illiterate']
default = ['no' 'unknown' 'yes']
housing = ['yes' 'no' 'unknown']
loan = ['no' 'unknown' 'yes']
contact = ['cellular' 'telephone']
month = ['may' 'jun' 'nov' 'sep' 'jul' 'aug' 'mar' 'oct' 'apr' 'dec']
day_of_week = ['fri' 'wed' 'mon' 'thu' 'tue']
duration = [ 487  346  227   17   58  128  290   44   68  170  301  148   97  211
  553  698  191   59   38  849  326  222  626  119  388  479  446  127
  109  113  393  151  256   42  525   57  499   84  137   31  430  126
  340  412  132   79  341  157  252  263  215   89  143   40   10  481
  233  204  403  180   16  447   81  361 1091  395  432  596   77  768
   96  357  459   11  264   93  374  158   95  835  505  300  390  274
  135  257  268  477   91   76  103  436  483  250  259  389    7  123
   92  297  406  104  854  147  203  149  144  394  523   73  197  108
   80  114  122 1161  181  239  360  314  984  663  141  706  797  311
   63  111   49  171  242  279  246  309  168  153  152   90  117  640
  199 1114   74  190  738  224  344  383   35  772  124  345  951  188
  809  192  154  100  317  293   30  442  187   64  629  423  888  207
  265  273   85  261  136  711   88   72  307   39  156  202  353  159
  347  174  280  686   94  225  474  377  185  121  160  313  219  267
  228  355  102  116   83  473  605  585  255 1868  846  404   51   87
  167  440  673   48  236  288  193  318  209  173  503  101  370 1207
  262  609  806  335  266  434   82   15  155  339  206  178  461   50
   56   55  142    9  247  130  336  424  617  238  632   86  165  212
   54  184    6   70   98  106  456  118  241  439  322  417  498  405
   99  712  112  223  133  258  958  898  282  175  235  372   69  183
  270  134  449  115  205  145  548  379  105  544  401  549  291  655
  179  391  750  454   23  363  775  164  988  471  385  125  886   34
  334  955  545  659  230  699 1276  251   25  696  701  342  161  275
  172  139  232  131   36  600  177  217  216  329  604  634  107  245
  690  286  201  198  249  226 1058  299  441  285  195  292  298 1013
  248 1319  146  294  575  237  861  618  271  200  166  367  218  584
  509   27   78  162  651  415 1149  110  240  366  284  431  608  244
  455  807  420  182  638  641   21 1348  324  331  550  489  304  189
  728  278  387   29   71  767 1476  176   52  150   32   12  501  381
  482   14  569  697  581  243  229  408   53  305  316  577  427  214
   19   65  281  468   67  438  582  721  295  231  221 1170  368 1360
  433  352   37  650  289  213   22   43   26  532   75  557  541   62
    5  941  422  319  653  397 1447  999  321 1143  667 1132   60  396
  194 1068  337  400  140  409  208   13  458  713  820  310  587  320
  566  748  599  411 1185  398  169  272   66  679    8   18  497 1065
  276  716   20  760  253  551  675   46  484  333  369  464  362  997
  287  649  470  762  591  758 1551  480  869   61  129  979  630  234
  354  502  451  296  407  120  754  589   41  514  919  530  595  526
  494   24 1353  332 1234  687  428  488  486  413  892  452  614  749
 1327   28   47  677  643 2653  302  570  938  260  901  138  590  546
  371  312  163  328  722  323  611  539  359  671  781 1005  303  343
  418   45  419 1148  349 3253  606  894  813  891  210 1067  543  382
  492 1183  903    4  375 1628  840 1167  386  868  327  485  506  351
  315  529 1720  533  429  766  616 1130  747  496 2301  460  220  776
  568  448  186  534 1334 1138 1019  364 1090  857  269  637  536  475
  453  330  338  764  873 1176  384   33  602  476    0  689  718  796
  662  799  715  633  348 1014  700 1045 1152  725  358  196  493  254
  742  504 1092  399  952  426  457 3643 1105  838  829  565  644  771
  513  646  356  693  592  628  556  769 1111  843  668  848  855  517
  992  619  867 1441  665 1171  542  607  800 1150 1855 1203  723  308
  823 1076  837  780  789 1002  578  507  508  567  421 1241  373  571
  469  527  588  645 1221  704  378 1127  818 1062  562  825  435  802
  531  306  739  365  325 1432 1806 1046  674  740 1119  636 1357  414
  727 1009  283 1011  511 1186  402  519  490  683  688 1340  472  882
  520  515 1332 1820 1311  559 1365 1980  410  895 1190  784  376  521
  834  450 1128  516  770 1074 1259 1422 1300 1135  624  540  657  627
  681  491  705  597 1298 1438  277 1087  782  416 1288 1424  720  726
  537  996  815  805 1468  801  495  463  814  350  702  623  980 1195
  478  881  445  658  528  522 1012 1590  621 1602  757  593  879  580
  620 1386]
campaign = [ 2  4  1  3  6  7 27  5 12 14 10  8 11 13  9 15 16 18 17 22 19 23 24 35
 29]
previous = [0 2 1 3 5 4 6]
poutcome = ['nonexistent' 'failure' 'success']
emp.var.rate = [-1.8  1.1  1.4 -0.1 -1.1 -2.9 -1.7 -3.4 -3.  -0.2]
cons.price.idx = [92.893 93.994 94.465 93.2   94.199 93.918 93.444 93.369 92.843 92.963
 94.601 94.027 92.379 92.431 93.749 93.075 94.055 92.469 94.767 92.201
 92.649 94.215 93.876 93.798 92.713 92.756]
cons.conf.idx = [-46.2 -36.4 -41.8 -42.  -37.5 -42.7 -36.1 -34.8 -50.  -40.8 -49.5 -38.3
 -29.8 -26.9 -34.6 -47.1 -39.8 -33.6 -50.8 -31.4 -30.1 -40.3 -40.  -40.4
 -33.  -45.9]
euribor3m = [1.313 4.855 4.962 4.959 4.191 0.884 0.879 4.153 4.958 4.968 4.859 4.963
 4.957 4.965 4.961 0.639 4.967 4.864 4.856 1.299 4.86  1.687 4.865 1.268
 4.12  1.334 0.977 1.344 0.899 1.327 4.592 4.97  1.26  4.966 0.77  4.866
 4.964 4.857 0.886 0.739 0.654 1.405 1.281 4.96  0.754 1.291 1.365 4.076
 1.266 1.41  1.25  4.858 0.702 1.029 1.085 1.392 1.262 1.05  0.851 0.716
 0.877 0.835 1.048 0.904 1.028 0.637 1.244 1.354 4.021 1.453 0.715 1.778
 0.773 1.035 0.9   0.898 0.742 0.861 1.264 0.704 1.27  0.695 1.039 1.531
 0.883 0.748 0.809 4.794 1.479 0.697 0.959 1.032 0.896 0.827 1.483 0.905
 1.466 0.714 0.644 0.849 0.881 0.834 0.645 0.659 0.885 1.041 0.942 0.737
 4.947 0.722 1.049 1.415 0.797 0.699 0.81  0.71  1.423 0.707 0.646 1.043
 4.955 0.668 0.825 1.435 0.72  0.767 0.982 1.602 1.259 1.811 0.859 1.224
 0.876 0.878 1.099 0.788 0.717 0.838 0.64  0.762 1.663 0.73  0.728 1.372
 0.782 4.245 1.51  3.329 0.749 4.343 0.893 0.731 0.635 0.7   0.889 0.649
 0.873 1.445 1.629 0.944 3.853 0.87  0.79  5.045 0.914 0.719 0.735 1.498
 0.677 0.819 0.652 0.692 0.829 1.726 1.406 0.761 0.846 1.252 4.956 0.953
 0.803 0.937 0.706 0.869 1.703 0.729 0.709 1.046 0.752 0.921 4.921 0.987
 1.03  1.031 0.741 0.843 1.044 0.643 0.755 0.724 0.882 1.757 1.215 0.74
 0.683 1.52  4.663 1.059 0.636 0.771 0.655 1.4   0.65  1.384 0.778 0.682
 1.614 1.04  1.538 1.072 1.    1.799 1.64  1.65  0.642 0.718 0.768 0.723
 0.996 0.721 0.672 0.854 1.016 0.965]
nr.employed = [5099 5191 5228 5195 4963 5008 5076 4991 5017 5023 5176]
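The listing shows that several categorical columns use the literal string 'unknown' as a category, which `isnull()` does not count as missing. A minimal sketch of counting those placeholders per column, on made-up rows:

```python
import pandas as pd

# Made-up rows using category labels that appear in the listing above.
toy = pd.DataFrame({
    "housing": ["yes", "no", "unknown", "yes"],
    "loan": ["no", "no", "unknown", "yes"],
    "contact": ["cellular", "telephone", "telephone", "cellular"],
})

# Element-wise comparison, then a per-column count of matches.
unknown_counts = (toy == "unknown").sum()
print(unknown_counts)
```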

Exploratory data analysis

In [ ]:
profile = ProfileReport(data, title="Bank Marketing Profiling Report")
profile
In [ ]:
data = data.drop('previous', axis=1)
data
Out[ ]:
class age job marital education default housing loan contact month day_of_week duration campaign poutcome emp.var.rate cons.price.idx cons.conf.idx euribor3m nr.employed
0 no 30 blue-collar married basic.9y no yes no cellular may fri 487 2 nonexistent -1.8 92.893 -46.2 1.313 5099
1 no 39 services single high.school no no no telephone may fri 346 4 nonexistent 1.1 93.994 -36.4 4.855 5191
2 no 25 services married high.school no yes no telephone jun wed 227 1 nonexistent 1.4 94.465 -41.8 4.962 5228
3 no 38 services married basic.9y no unknown unknown telephone jun fri 17 3 nonexistent 1.4 94.465 -41.8 4.959 5228
4 no 47 admin. married university.degree no yes no cellular nov mon 58 1 nonexistent -0.1 93.200 -42.0 4.191 5195
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4114 no 30 admin. married basic.6y no yes yes cellular jul thu 53 1 nonexistent 1.4 93.918 -42.7 4.958 5228
4115 no 39 admin. married high.school no yes no telephone jul fri 219 1 nonexistent 1.4 93.918 -42.7 4.959 5228
4116 no 27 student single high.school no no no cellular may mon 64 2 failure -1.8 92.893 -46.2 1.354 5099
4117 no 58 admin. married high.school no no no cellular aug fri 528 1 nonexistent 1.4 93.444 -36.1 4.966 5228
4118 no 34 management single high.school no yes no cellular nov wed 175 1 nonexistent -0.1 93.200 -42.0 4.120 5195

4119 rows × 19 columns

Converting categorical variables to numeric

In [ ]:
# Encode the target variable as 0/1
data['class'] = data['class'].map({'yes': 1, 'no': 0})
data
Out[ ]:
class age job marital education default housing loan contact month day_of_week duration campaign poutcome emp.var.rate cons.price.idx cons.conf.idx euribor3m nr.employed
0 0 30 blue-collar married basic.9y no yes no cellular may fri 487 2 nonexistent -1.8 92.893 -46.2 1.313 5099
1 0 39 services single high.school no no no telephone may fri 346 4 nonexistent 1.1 93.994 -36.4 4.855 5191
2 0 25 services married high.school no yes no telephone jun wed 227 1 nonexistent 1.4 94.465 -41.8 4.962 5228
3 0 38 services married basic.9y no unknown unknown telephone jun fri 17 3 nonexistent 1.4 94.465 -41.8 4.959 5228
4 0 47 admin. married university.degree no yes no cellular nov mon 58 1 nonexistent -0.1 93.200 -42.0 4.191 5195
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4114 0 30 admin. married basic.6y no yes yes cellular jul thu 53 1 nonexistent 1.4 93.918 -42.7 4.958 5228
4115 0 39 admin. married high.school no yes no telephone jul fri 219 1 nonexistent 1.4 93.918 -42.7 4.959 5228
4116 0 27 student single high.school no no no cellular may mon 64 2 failure -1.8 92.893 -46.2 1.354 5099
4117 0 58 admin. married high.school no no no cellular aug fri 528 1 nonexistent 1.4 93.444 -36.1 4.966 5228
4118 0 34 management single high.school no yes no cellular nov wed 175 1 nonexistent -0.1 93.200 -42.0 4.120 5195

4119 rows × 19 columns
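`Series.map` with a dict returns NaN for any label missing from the dict, so a quick check after encoding can catch unexpected values. A sketch on made-up labels (the stray 'Yes' is invented to show the failure mode; the real column only contains 'no' and 'yes'):

```python
import pandas as pd

# 'Yes' is deliberately absent from the mapping below.
labels = pd.Series(["no", "yes", "no", "Yes"])

encoded = labels.map({"yes": 1, "no": 0})

# Any NaN here marks a label the mapping did not cover.
unmapped = labels[encoded.isna()].unique().tolist()
print(unmapped)
```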

In [ ]:
# Select the categorical variables for one-hot encoding
categorico = data.select_dtypes(exclude=['int64', 'float'])
categorico.columns
Out[ ]:
Index(['job', 'marital', 'education', 'default', 'housing', 'loan', 'contact',
       'month', 'day_of_week', 'poutcome'],
      dtype='object')
In [ ]:
data.columns
Out[ ]:
Index(['class', 'age', 'job', 'marital', 'education', 'default', 'housing',
       'loan', 'contact', 'month', 'day_of_week', 'duration', 'campaign',
       'poutcome', 'emp.var.rate', 'cons.price.idx', 'cons.conf.idx',
       'euribor3m', 'nr.employed'],
      dtype='object')
In [ ]:
# One-hot encode the categorical variables with get_dummies()
one_hot_encoded = pd.get_dummies(data[['job', 'marital', 'education', 'default', 'housing',
       'loan', 'contact', 'month', 'day_of_week',
       'poutcome']])
In [ ]:
# Concatenate the DataFrames
df_encoded = pd.concat([data, one_hot_encoded], axis=1)

# Drop the original categorical variables
df_encoded = df_encoded.drop(['job', 'marital', 'education', 'default', 'housing', 'loan', 'contact', 'month', 'day_of_week', 'poutcome'], axis=1)
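The encode, concatenate, and drop steps above can also be written as a single call by passing `columns=` to `pd.get_dummies`. A sketch on a made-up frame; `dtype="uint8"` is set explicitly because recent pandas versions return boolean dummies by default, whereas the output in this notebook shows `uint8`:

```python
import pandas as pd

# Made-up rows with two of the categorical columns from the dataset.
toy = pd.DataFrame({
    "class": [0, 1, 0],
    "age": [30, 39, 25],
    "contact": ["cellular", "telephone", "telephone"],
    "poutcome": ["nonexistent", "failure", "nonexistent"],
})

categorical_cols = ["contact", "poutcome"]

# Encodes the listed columns, drops the originals, and keeps the rest untouched.
toy_encoded = pd.get_dummies(toy, columns=categorical_cols, dtype="uint8")
print(toy_encoded.columns.tolist())
```

Keeping the column list in one variable also avoids repeating the ten names in both the encoding and the drop step.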
In [ ]:
df_encoded
Out[ ]:
class age duration campaign emp.var.rate cons.price.idx cons.conf.idx euribor3m nr.employed job_admin. job_blue-collar job_entrepreneur job_housemaid job_management job_retired job_self-employed job_services job_student job_technician job_unemployed job_unknown marital_divorced marital_married marital_single marital_unknown education_basic.4y education_basic.6y education_basic.9y education_high.school education_illiterate education_professional.course education_university.degree education_unknown default_no default_unknown default_yes housing_no housing_unknown housing_yes loan_no loan_unknown loan_yes contact_cellular contact_telephone month_apr month_aug month_dec month_jul month_jun month_mar month_may month_nov month_oct month_sep day_of_week_fri day_of_week_mon day_of_week_thu day_of_week_tue day_of_week_wed poutcome_failure poutcome_nonexistent poutcome_success
0 0 30 487 2 -1.8 92.893 -46.2 1.313 5099 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0
1 0 39 346 4 1.1 93.994 -36.4 4.855 5191 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0
2 0 25 227 1 1.4 94.465 -41.8 4.962 5228 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0
3 0 38 17 3 1.4 94.465 -41.8 4.959 5228 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0
4 0 47 58 1 -0.1 93.200 -42.0 4.191 5195 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4114 0 30 53 1 1.4 93.918 -42.7 4.958 5228 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0
4115 0 39 219 1 1.4 93.918 -42.7 4.959 5228 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0
4116 0 27 64 2 -1.8 92.893 -46.2 1.354 5099 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0
4117 0 58 528 1 1.4 93.444 -36.1 4.966 5228 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0
4118 0 34 175 1 -0.1 93.200 -42.0 4.120 5195 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0

4119 rows × 62 columns

In [ ]:
df_encoded.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4119 entries, 0 to 4118
Data columns (total 62 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   class                          4119 non-null   int64  
 1   age                            4119 non-null   int64  
 2   duration                       4119 non-null   int64  
 3   campaign                       4119 non-null   int64  
 4   emp.var.rate                   4119 non-null   float64
 5   cons.price.idx                 4119 non-null   float64
 6   cons.conf.idx                  4119 non-null   float64
 7   euribor3m                      4119 non-null   float64
 8   nr.employed                    4119 non-null   int64  
 9   job_admin.                     4119 non-null   uint8  
 10  job_blue-collar                4119 non-null   uint8  
 11  job_entrepreneur               4119 non-null   uint8  
 12  job_housemaid                  4119 non-null   uint8  
 13  job_management                 4119 non-null   uint8  
 14  job_retired                    4119 non-null   uint8  
 15  job_self-employed              4119 non-null   uint8  
 16  job_services                   4119 non-null   uint8  
 17  job_student                    4119 non-null   uint8  
 18  job_technician                 4119 non-null   uint8  
 19  job_unemployed                 4119 non-null   uint8  
 20  job_unknown                    4119 non-null   uint8  
 21  marital_divorced               4119 non-null   uint8  
 22  marital_married                4119 non-null   uint8  
 23  marital_single                 4119 non-null   uint8  
 24  marital_unknown                4119 non-null   uint8  
 25  education_basic.4y             4119 non-null   uint8  
 26  education_basic.6y             4119 non-null   uint8  
 27  education_basic.9y             4119 non-null   uint8  
 28  education_high.school          4119 non-null   uint8  
 29  education_illiterate           4119 non-null   uint8  
 30  education_professional.course  4119 non-null   uint8  
 31  education_university.degree    4119 non-null   uint8  
 32  education_unknown              4119 non-null   uint8  
 33  default_no                     4119 non-null   uint8  
 34  default_unknown                4119 non-null   uint8  
 35  default_yes                    4119 non-null   uint8  
 36  housing_no                     4119 non-null   uint8  
 37  housing_unknown                4119 non-null   uint8  
 38  housing_yes                    4119 non-null   uint8  
 39  loan_no                        4119 non-null   uint8  
 40  loan_unknown                   4119 non-null   uint8  
 41  loan_yes                       4119 non-null   uint8  
 42  contact_cellular               4119 non-null   uint8  
 43  contact_telephone              4119 non-null   uint8  
 44  month_apr                      4119 non-null   uint8  
 45  month_aug                      4119 non-null   uint8  
 46  month_dec                      4119 non-null   uint8  
 47  month_jul                      4119 non-null   uint8  
 48  month_jun                      4119 non-null   uint8  
 49  month_mar                      4119 non-null   uint8  
 50  month_may                      4119 non-null   uint8  
 51  month_nov                      4119 non-null   uint8  
 52  month_oct                      4119 non-null   uint8  
 53  month_sep                      4119 non-null   uint8  
 54  day_of_week_fri                4119 non-null   uint8  
 55  day_of_week_mon                4119 non-null   uint8  
 56  day_of_week_thu                4119 non-null   uint8  
 57  day_of_week_tue                4119 non-null   uint8  
 58  day_of_week_wed                4119 non-null   uint8  
 59  poutcome_failure               4119 non-null   uint8  
 60  poutcome_nonexistent           4119 non-null   uint8  
 61  poutcome_success               4119 non-null   uint8  
dtypes: float64(4), int64(5), uint8(53)
memory usage: 502.9 KB
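
The column list above keeps one dummy column for every level of each categorical variable (for example, both contact_cellular and contact_telephone). In such an encoding the columns of one variable always sum to 1 per row, so any one of them is determined by the others. The following is a minimal sketch on toy data, assuming the dummies were produced with something like pd.get_dummies; the `raw` frame and its values are hypothetical and not taken from the notebook. It shows the redundancy and how `drop_first=True` would remove one level per variable.

```python
import pandas as pd

# Hypothetical toy stand-in for the raw table (not the real bank marketing data).
raw = pd.DataFrame({
    "age": [30, 39, 27, 58],
    "contact": ["cellular", "telephone", "cellular", "cellular"],
    "default": ["no", "unknown", "no", "no"],
})

# Full encoding: one column per level, mirroring the layout listed above.
full = pd.get_dummies(raw, columns=["contact", "default"], dtype="uint8")

# Within one encoded variable the dummy columns sum to 1 on every row,
# so each column is an exact linear function of the others.
contact_cols = [c for c in full.columns if c.startswith("contact_")]
row_sums = full[contact_cols].sum(axis=1)

# Reduced encoding: drop the first level of each variable.
reduced = pd.get_dummies(
    raw, columns=["contact", "default"], drop_first=True, dtype="uint8"
)

print(full.columns.tolist())
print(reduced.columns.tolist())
```

Whether dropping a level is worthwhile depends on the model: tree-based classifiers are largely insensitive to this redundancy, while linear models without regularization can be affected by it.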

Feature selection

In [ ]:
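# Pairwise Pearson correlations between all encoded columns, rounded to one decimal
correlaciones = df_encoded.corr().round(1)
# Render the matrix as a heatmap-style table (blue = negative, red = positive)
correlaciones.style.background_gradient(cmap='coolwarm')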
Out[ ]:
  class age duration campaign emp.var.rate cons.price.idx cons.conf.idx euribor3m nr.employed job_admin. job_blue-collar job_entrepreneur job_housemaid job_management job_retired job_self-employed job_services job_student job_technician job_unemployed job_unknown marital_divorced marital_married marital_single marital_unknown education_basic.4y education_basic.6y education_basic.9y education_high.school education_illiterate education_professional.course education_university.degree education_unknown default_no default_unknown default_yes housing_no housing_unknown housing_yes loan_no loan_unknown loan_yes contact_cellular contact_telephone month_apr month_aug month_dec month_jul month_jun month_mar month_may month_nov month_oct month_sep day_of_week_fri day_of_week_mon day_of_week_thu day_of_week_tue day_of_week_wed poutcome_failure poutcome_nonexistent poutcome_success
class 1.000000 0.100000 0.400000 -0.100000 -0.300000 -0.100000 0.100000 -0.300000 -0.300000 0.000000 -0.100000 -0.000000 -0.000000 -0.000000 0.100000 -0.000000 -0.000000 0.100000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 0.100000 -0.100000 -0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.100000 -0.100000 0.000000 -0.000000 0.100000 -0.000000 0.000000 0.200000 -0.100000 -0.000000 0.100000 0.100000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.200000 0.300000
age 0.100000 1.000000 0.000000 -0.000000 -0.000000 -0.000000 0.100000 -0.000000 -0.000000 -0.100000 -0.000000 0.000000 0.100000 0.100000 0.400000 0.000000 -0.100000 -0.200000 -0.100000 -0.000000 0.100000 0.200000 0.300000 -0.400000 0.000000 0.200000 0.000000 -0.000000 -0.100000 0.000000 0.000000 -0.100000 0.100000 -0.200000 0.200000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 0.100000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 0.100000 0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.100000 0.000000 -0.000000 0.000000
duration 0.400000 0.000000 1.000000 -0.100000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.100000 0.000000 0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000
campaign -0.100000 -0.000000 -0.100000 1.000000 0.200000 0.100000 0.000000 0.200000 0.200000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.100000 0.100000 -0.100000 0.000000 0.000000 0.100000 0.100000 -0.000000 -0.000000 -0.100000 -0.100000 -0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.100000 0.100000 -0.100000
emp.var.rate -0.300000 -0.000000 -0.000000 0.200000 1.000000 0.800000 0.200000 1.000000 0.900000 -0.000000 0.100000 0.000000 0.000000 -0.000000 -0.100000 0.000000 -0.000000 -0.100000 0.000000 -0.000000 0.000000 0.000000 0.100000 -0.100000 0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.200000 0.200000 -0.000000 0.100000 0.000000 -0.100000 -0.000000 0.000000 -0.000000 -0.400000 0.400000 -0.300000 0.200000 -0.100000 0.300000 0.100000 -0.100000 -0.100000 -0.100000 -0.200000 -0.200000 -0.000000 -0.000000 0.000000 0.000000 0.000000 -0.400000 0.500000 -0.300000
cons.price.idx -0.100000 -0.000000 0.000000 0.100000 0.800000 1.000000 0.000000 0.700000 0.500000 -0.100000 0.100000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 -0.100000 -0.000000 0.100000 0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.100000 0.000000 -0.200000 0.200000 -0.000000 0.100000 0.000000 -0.100000 0.000000 0.000000 -0.000000 -0.600000 0.600000 -0.200000 -0.200000 -0.100000 0.300000 0.400000 -0.100000 -0.100000 -0.200000 -0.100000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.300000 0.300000 -0.100000
cons.conf.idx 0.100000 0.100000 -0.000000 0.000000 0.200000 0.000000 1.000000 0.300000 0.100000 0.100000 -0.100000 -0.000000 0.000000 -0.000000 0.100000 0.000000 -0.100000 0.000000 0.100000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 -0.100000 -0.100000 -0.100000 0.000000 0.000000 0.100000 0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.300000 0.300000 -0.300000 0.500000 0.100000 -0.200000 -0.100000 -0.100000 -0.000000 -0.100000 0.200000 0.200000 -0.000000 -0.000000 -0.000000 0.100000 0.000000 -0.200000 0.100000 0.100000
euribor3m -0.300000 -0.000000 -0.000000 0.200000 1.000000 0.700000 0.300000 1.000000 0.900000 -0.000000 0.000000 0.000000 0.000000 -0.000000 -0.100000 0.000000 -0.000000 -0.100000 0.000000 -0.000000 0.000000 0.000000 0.100000 -0.100000 0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.200000 0.200000 0.000000 0.100000 0.000000 -0.100000 -0.000000 0.000000 0.000000 -0.400000 0.400000 -0.300000 0.200000 -0.100000 0.300000 0.100000 -0.200000 -0.200000 0.000000 -0.200000 -0.200000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.400000 0.500000 -0.300000
nr.employed -0.300000 -0.000000 -0.000000 0.200000 0.900000 0.500000 0.100000 0.900000 1.000000 -0.000000 0.100000 0.000000 0.000000 -0.000000 -0.100000 0.000000 -0.000000 -0.100000 0.000000 -0.000000 0.000000 0.000000 0.100000 -0.100000 0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.200000 0.200000 0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.300000 0.300000 -0.200000 0.200000 -0.100000 0.300000 0.200000 -0.200000 -0.200000 0.000000 -0.300000 -0.300000 -0.000000 -0.000000 0.000000 0.000000 0.000000 -0.400000 0.500000 -0.400000
job_admin. 0.000000 -0.100000 0.000000 0.000000 -0.000000 -0.100000 0.100000 -0.000000 -0.000000 1.000000 -0.300000 -0.100000 -0.100000 -0.200000 -0.100000 -0.100000 -0.200000 -0.100000 -0.300000 -0.100000 -0.100000 0.000000 -0.100000 0.100000 0.000000 -0.200000 -0.100000 -0.200000 0.100000 -0.000000 -0.200000 0.300000 -0.000000 0.100000 -0.100000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.100000 -0.100000 -0.000000 0.100000 -0.000000 -0.000000 -0.100000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.100000
job_blue-collar -0.100000 -0.000000 0.000000 -0.000000 0.100000 0.100000 -0.100000 0.000000 0.100000 -0.300000 1.000000 -0.100000 -0.100000 -0.200000 -0.100000 -0.100000 -0.200000 -0.100000 -0.200000 -0.100000 -0.100000 -0.100000 0.100000 -0.100000 -0.000000 0.300000 0.300000 0.300000 -0.200000 -0.000000 -0.100000 -0.300000 0.000000 -0.200000 0.200000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.100000 0.100000 0.000000 -0.100000 -0.000000 0.000000 0.000000 -0.000000 0.100000 -0.100000 -0.000000 -0.100000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 -0.100000
job_entrepreneur -0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 -0.100000 -0.100000 1.000000 -0.000000 -0.100000 -0.000000 -0.000000 -0.100000 -0.000000 -0.100000 -0.000000 -0.000000 0.000000 0.000000 -0.100000 -0.000000 0.000000 -0.000000 0.000000 -0.100000 -0.000000 -0.000000 0.100000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.100000 -0.000000 -0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000
job_housemaid -0.000000 0.100000 -0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.100000 -0.100000 -0.000000 1.000000 -0.000000 -0.000000 -0.000000 -0.100000 -0.000000 -0.100000 -0.000000 -0.000000 0.000000 0.000000 -0.100000 -0.000000 0.200000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 0.100000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000
job_management -0.000000 0.100000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.200000 -0.200000 -0.100000 -0.000000 1.000000 -0.100000 -0.100000 -0.100000 -0.000000 -0.100000 -0.000000 -0.000000 0.000000 0.100000 -0.100000 -0.000000 -0.100000 -0.000000 -0.100000 -0.100000 -0.000000 -0.100000 0.200000 0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.100000 -0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000
job_retired 0.100000 0.400000 0.000000 -0.000000 -0.100000 -0.000000 0.100000 -0.100000 -0.100000 -0.100000 -0.100000 -0.000000 -0.000000 -0.100000 1.000000 -0.000000 -0.100000 -0.000000 -0.100000 -0.000000 -0.000000 0.100000 0.100000 -0.100000 0.100000 0.200000 -0.000000 -0.000000 -0.000000 0.100000 0.000000 -0.100000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000 -0.100000 -0.000000 0.100000 0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 0.100000
job_self-employed -0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 0.000000 -0.100000 -0.100000 -0.000000 -0.000000 -0.100000 -0.000000 1.000000 -0.100000 -0.000000 -0.100000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.100000 -0.000000 -0.000000 0.100000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000
job_services -0.000000 -0.100000 -0.000000 0.000000 -0.000000 0.000000 -0.100000 -0.000000 -0.000000 -0.200000 -0.200000 -0.100000 -0.100000 -0.100000 -0.100000 -0.100000 1.000000 -0.000000 -0.100000 -0.100000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.100000 -0.000000 0.000000 0.300000 -0.000000 -0.100000 -0.200000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 -0.100000 -0.000000 0.000000 0.000000 -0.000000 0.100000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000
job_student 0.100000 -0.200000 0.000000 -0.000000 -0.100000 -0.000000 0.000000 -0.100000 -0.100000 -0.100000 -0.100000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 1.000000 -0.100000 -0.000000 -0.000000 -0.000000 -0.200000 0.200000 -0.000000 -0.000000 -0.000000 -0.000000 0.100000 -0.000000 -0.000000 -0.000000 0.100000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 0.100000 -0.100000 0.000000
job_technician 0.000000 -0.100000 -0.000000 -0.000000 0.000000 -0.000000 0.100000 0.000000 0.000000 -0.300000 -0.200000 -0.100000 -0.100000 -0.100000 -0.100000 -0.100000 -0.100000 -0.100000 1.000000 -0.100000 -0.000000 0.000000 -0.000000 0.100000 -0.000000 -0.100000 -0.100000 -0.100000 -0.100000 -0.000000 0.500000 -0.000000 -0.000000 0.100000 -0.100000 -0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.100000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000
job_unemployed 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.100000 -0.100000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.100000 -0.000000 -0.100000 1.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 0.100000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000
job_unknown -0.000000 0.100000 -0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.100000 -0.100000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 1.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.100000 -0.100000 0.100000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000
marital_divorced -0.000000 0.200000 0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 0.000000 -0.100000 0.000000 0.000000 0.000000 0.100000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 1.000000 -0.400000 -0.200000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000
marital_married -0.000000 0.300000 -0.000000 -0.000000 0.100000 0.000000 0.000000 0.100000 0.100000 -0.100000 0.100000 0.000000 0.000000 0.100000 0.100000 -0.000000 0.000000 -0.200000 -0.000000 0.000000 0.000000 -0.400000 1.000000 -0.800000 -0.100000 0.100000 0.100000 0.100000 -0.000000 -0.000000 0.000000 -0.100000 -0.000000 -0.100000 0.100000 0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.100000 0.100000 0.000000 0.000000 0.000000 -0.100000 0.000000 -0.100000 0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000
marital_single 0.000000 -0.400000 -0.000000 0.000000 -0.100000 -0.100000 -0.000000 -0.100000 -0.100000 0.100000 -0.100000 -0.100000 -0.100000 -0.100000 -0.100000 0.000000 -0.000000 0.200000 0.100000 -0.000000 -0.000000 -0.200000 -0.800000 1.000000 -0.000000 -0.100000 -0.100000 -0.000000 0.000000 -0.000000 -0.000000 0.100000 0.000000 0.100000 -0.100000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.100000 -0.100000 -0.000000 0.000000 -0.000000 0.100000 -0.000000 0.100000 -0.000000 -0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000
marital_unknown -0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.100000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.100000 -0.000000 1.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000
education_basic.4y -0.000000 0.200000 -0.000000 -0.000000 0.000000 0.100000 0.000000 0.000000 0.000000 -0.200000 0.300000 0.000000 0.200000 -0.100000 0.200000 -0.000000 -0.100000 -0.000000 -0.100000 0.000000 0.000000 0.000000 0.100000 -0.100000 0.000000 1.000000 -0.100000 -0.100000 -0.200000 -0.000000 -0.100000 -0.200000 -0.100000 -0.200000 0.200000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000
education_basic.6y -0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.100000 0.000000 0.000000 -0.100000 0.300000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.100000 -0.000000 -0.000000 -0.000000 0.100000 -0.100000 -0.000000 -0.100000 1.000000 -0.100000 -0.100000 -0.000000 -0.100000 -0.200000 -0.000000 -0.100000 0.100000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 -0.100000 0.100000 0.000000 -0.100000 -0.000000 0.000000 0.000000 -0.000000 0.100000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 -0.000000
education_basic.9y -0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.100000 0.000000 0.000000 -0.200000 0.300000 0.000000 -0.000000 -0.100000 -0.000000 0.000000 0.000000 -0.000000 -0.100000 0.000000 0.000000 -0.000000 0.100000 -0.000000 -0.000000 -0.100000 -0.100000 1.000000 -0.200000 -0.000000 -0.200000 -0.300000 -0.100000 -0.100000 0.100000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.100000 0.100000 0.000000 -0.100000 -0.000000 0.000000 0.000000 -0.000000 0.100000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000
education_high.school -0.000000 -0.100000 0.000000 0.000000 -0.000000 -0.000000 -0.100000 -0.000000 -0.000000 0.100000 -0.200000 -0.100000 -0.000000 -0.100000 -0.000000 -0.100000 0.300000 0.100000 -0.100000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 -0.200000 -0.100000 -0.200000 1.000000 -0.000000 -0.200000 -0.400000 -0.100000 0.000000 -0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.100000 0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000
education_illiterate -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.100000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 1.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000
education_professional.course 0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.200000 -0.100000 -0.000000 -0.000000 -0.100000 0.000000 -0.000000 -0.100000 -0.000000 0.500000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.100000 -0.100000 -0.200000 -0.200000 -0.000000 1.000000 -0.300000 -0.100000 0.100000 -0.100000 -0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000
education_university.degree 0.000000 -0.100000 -0.000000 0.000000 -0.000000 -0.100000 0.100000 -0.000000 -0.000000 0.300000 -0.300000 0.100000 -0.000000 0.200000 -0.100000 0.100000 -0.200000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.100000 0.100000 0.000000 -0.200000 -0.200000 -0.300000 -0.400000 -0.000000 -0.300000 1.000000 -0.100000 0.100000 -0.100000 -0.000000 -0.000000 -0.100000 0.000000 0.000000 -0.100000 0.000000 0.100000 -0.100000 -0.000000 0.200000 0.000000 -0.100000 -0.000000 0.000000 -0.100000 0.100000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000
education_unknown 0.000000 0.100000 0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.100000 -0.000000 -0.000000 0.100000 -0.000000 -0.000000 0.000000 -0.000000 -0.100000 -0.000000 -0.100000 -0.100000 -0.000000 -0.100000 -0.100000 1.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000
default_no 0.100000 -0.200000 0.000000 -0.000000 -0.200000 -0.200000 -0.000000 -0.200000 -0.200000 0.100000 -0.200000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 0.100000 0.000000 -0.100000 0.000000 -0.100000 0.100000 0.000000 -0.200000 -0.100000 -0.100000 0.000000 0.000000 0.100000 0.100000 -0.000000 1.000000 -1.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.200000 -0.200000 0.000000 0.000000 0.000000 -0.000000 -0.100000 0.000000 -0.100000 0.100000 0.000000 0.100000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.100000 -0.100000 0.100000
default_unknown -0.100000 0.200000 -0.000000 0.000000 0.200000 0.200000 0.000000 0.200000 0.200000 -0.100000 0.200000 0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.100000 -0.000000 0.100000 -0.000000 0.100000 -0.100000 -0.000000 0.200000 0.100000 0.100000 -0.000000 -0.000000 -0.100000 -0.100000 0.000000 -1.000000 1.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.200000 0.200000 -0.000000 -0.000000 -0.000000 0.000000 0.100000 -0.000000 0.100000 -0.100000 -0.000000 -0.100000 -0.000000 0.000000 0.000000 0.000000 -0.000000 -0.100000 0.100000 -0.100000
default_yes -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.100000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 1.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000
housing_no 0.000000 0.000000 0.000000 0.000000 0.100000 0.100000 0.000000 0.100000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 1.000000 -0.100000 -0.900000 0.100000 -0.100000 -0.100000 -0.100000 0.100000 -0.000000 -0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000
housing_unknown -0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000 -0.100000 -0.000000 -0.000000 0.000000 -0.000000 -0.100000 1.000000 -0.200000 -0.300000 1.000000 -0.100000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000
housing_yes 0.000000 -0.000000 -0.000000 -0.000000 -0.100000 -0.100000 -0.000000 -0.100000 -0.000000 0.000000 -0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.900000 -0.200000 1.000000 -0.000000 -0.200000 0.100000 0.100000 -0.100000 0.000000 0.000000 0.000000 -0.000000 -0.100000 0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000
loan_no 0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000 0.100000 -0.300000 -0.000000 1.000000 -0.300000 -0.900000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000
loan_unknown -0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000 -0.100000 -0.000000 -0.000000 0.000000 -0.000000 -0.100000 1.000000 -0.200000 -0.300000 1.000000 -0.100000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000
loan_yes -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.100000 -0.100000 0.100000 -0.900000 -0.100000 1.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000
contact_cellular 0.100000 -0.000000 0.000000 -0.100000 -0.400000 -0.600000 -0.300000 -0.400000 -0.300000 0.100000 -0.100000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.100000 0.100000 -0.000000 -0.000000 -0.100000 -0.100000 0.000000 0.000000 0.000000 0.100000 -0.000000 0.200000 -0.200000 0.000000 -0.100000 -0.000000 0.100000 0.000000 -0.000000 -0.000000 1.000000 -1.000000 0.100000 0.300000 0.000000 0.200000 -0.400000 0.100000 -0.300000 0.200000 0.000000 0.000000 -0.000000 -0.000000 0.100000 -0.000000 -0.000000 0.200000 -0.300000 0.100000
contact_telephone -0.100000 0.000000 -0.000000 0.100000 0.400000 0.600000 0.300000 0.400000 0.300000 -0.100000 0.100000 0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 0.100000 -0.100000 0.000000 0.000000 0.100000 0.100000 -0.000000 -0.000000 -0.000000 -0.100000 0.000000 -0.200000 0.200000 -0.000000 0.100000 0.000000 -0.100000 -0.000000 0.000000 0.000000 -1.000000 1.000000 -0.100000 -0.300000 -0.000000 -0.200000 0.400000 -0.100000 0.300000 -0.200000 -0.000000 -0.000000 0.000000 0.000000 -0.100000 0.000000 0.000000 -0.200000 0.300000 -0.100000
month_apr 0.000000 0.000000 0.000000 -0.100000 -0.300000 -0.200000 -0.300000 -0.300000 -0.200000 -0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 0.100000 -0.100000 1.000000 -0.100000 -0.000000 -0.100000 -0.100000 -0.000000 -0.200000 -0.100000 -0.000000 -0.000000 0.100000 0.000000 0.000000 -0.100000 -0.100000 0.100000 -0.100000 -0.000000
month_aug -0.000000 0.100000 -0.100000 0.000000 0.200000 -0.200000 0.500000 0.200000 0.200000 0.100000 -0.100000 -0.000000 0.100000 0.000000 0.000000 -0.000000 -0.100000 0.000000 0.100000 -0.000000 0.000000 -0.000000 0.000000 0.000000 0.000000 -0.000000 -0.100000 -0.100000 -0.100000 0.000000 0.000000 0.200000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 0.300000 -0.300000 -0.100000 1.000000 -0.000000 -0.200000 -0.200000 -0.000000 -0.300000 -0.100000 -0.100000 -0.100000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.100000 0.100000 0.000000
month_dec 0.100000 0.000000 0.000000 0.000000 -0.100000 -0.100000 0.100000 -0.100000 -0.100000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 1.000000 -0.000000 -0.000000 -0.000000 -0.100000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 -0.100000 0.100000
month_jul -0.000000 -0.000000 0.000000 0.100000 0.300000 0.300000 -0.200000 0.300000 0.300000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 -0.100000 0.100000 -0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.100000 0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.200000 -0.200000 -0.100000 -0.200000 -0.000000 1.000000 -0.200000 -0.000000 -0.300000 -0.200000 -0.100000 -0.100000 -0.100000 -0.000000 0.000000 -0.000000 0.000000 -0.100000 0.100000 -0.000000
month_jun 0.000000 -0.000000 -0.000000 0.100000 0.100000 0.400000 -0.100000 0.100000 0.200000 -0.100000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.100000 0.100000 -0.000000 0.000000 0.000000 -0.100000 -0.000000 0.000000 -0.000000 -0.400000 0.400000 -0.100000 -0.200000 -0.000000 -0.200000 1.000000 -0.000000 -0.300000 -0.100000 -0.100000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.100000 0.100000 -0.000000
month_mar 0.200000 0.000000 0.000000 -0.000000 -0.100000 -0.100000 -0.100000 -0.200000 -0.200000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.100000 0.100000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 0.100000 -0.100000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 1.000000 -0.100000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.100000 0.100000
month_may -0.100000 -0.000000 0.000000 -0.000000 -0.100000 -0.100000 -0.000000 -0.200000 -0.200000 -0.000000 0.100000 0.000000 -0.000000 -0.000000 -0.100000 0.000000 0.100000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 0.100000 0.100000 0.000000 -0.000000 -0.000000 -0.100000 -0.000000 -0.100000 0.100000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.300000 0.300000 -0.200000 -0.300000 -0.100000 -0.300000 -0.300000 -0.100000 1.000000 -0.200000 -0.100000 -0.100000 0.000000 -0.000000 -0.100000 0.000000 0.100000 0.100000 -0.000000 -0.100000
month_nov -0.000000 0.000000 0.000000 -0.100000 -0.100000 -0.200000 -0.100000 0.000000 0.000000 -0.000000 -0.100000 0.100000 0.000000 0.100000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.100000 -0.000000 0.100000 -0.100000 0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.200000 -0.200000 -0.100000 -0.100000 -0.000000 -0.200000 -0.100000 -0.000000 -0.200000 1.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 0.100000 -0.100000 0.000000
month_oct 0.100000 0.100000 0.000000 -0.100000 -0.200000 -0.100000 0.200000 -0.200000 -0.300000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.100000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.100000 -0.000000 -0.100000 -0.100000 -0.000000 -0.100000 -0.000000 1.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.100000 -0.100000 0.000000
month_sep 0.100000 0.000000 0.000000 -0.000000 -0.200000 -0.000000 0.200000 -0.200000 -0.300000 0.000000 -0.100000 -0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 0.100000 -0.100000 -0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.100000 -0.000000 -0.100000 -0.000000 -0.000000 -0.100000 -0.000000 -0.000000 1.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 0.100000 -0.100000 0.100000
day_of_week_fri -0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 0.100000 -0.000000 -0.000000 -0.100000 0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 1.000000 -0.200000 -0.200000 -0.200000 -0.200000 0.000000 -0.000000 -0.000000
day_of_week_mon 0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.200000 1.000000 -0.300000 -0.300000 -0.300000 -0.000000 0.000000 0.000000
day_of_week_thu 0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 0.100000 -0.100000 0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.100000 0.000000 -0.000000 0.000000 -0.200000 -0.300000 1.000000 -0.300000 -0.300000 -0.000000 -0.000000 0.000000
day_of_week_tue -0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 0.100000 0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.100000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.200000 -0.300000 -0.300000 1.000000 -0.200000 -0.000000 0.000000 0.000000
day_of_week_wed -0.000000 -0.100000 0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.100000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.100000 0.000000 -0.000000 -0.000000 -0.200000 -0.300000 -0.300000 -0.200000 1.000000 0.000000 0.000000 -0.000000
poutcome_failure 0.000000 0.000000 -0.000000 -0.100000 -0.400000 -0.300000 -0.200000 -0.400000 -0.400000 -0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 0.100000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 -0.000000 0.000000 0.100000 -0.100000 0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 0.200000 -0.200000 0.100000 -0.100000 0.000000 -0.100000 -0.100000 0.000000 0.100000 0.100000 0.100000 0.100000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 1.000000 -0.900000 -0.100000
poutcome_nonexistent -0.200000 -0.000000 -0.000000 0.100000 0.500000 0.300000 0.100000 0.500000 0.500000 -0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.100000 0.000000 -0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.100000 0.100000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.300000 0.300000 -0.100000 0.100000 -0.100000 0.100000 0.100000 -0.100000 -0.000000 -0.100000 -0.100000 -0.100000 -0.000000 0.000000 -0.000000 0.000000 0.000000 -0.900000 1.000000 -0.500000
poutcome_success 0.300000 0.000000 0.000000 -0.100000 -0.300000 -0.100000 0.100000 -0.300000 -0.400000 0.100000 -0.100000 -0.000000 -0.000000 0.000000 0.100000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 0.000000 -0.000000 0.000000 0.000000 0.000000 0.100000 -0.100000 -0.000000 -0.000000 -0.000000 0.000000 0.000000 -0.000000 -0.000000 0.100000 -0.100000 -0.000000 0.000000 0.100000 -0.000000 -0.000000 0.100000 -0.100000 0.000000 0.000000 0.100000 -0.000000 0.000000 0.000000 0.000000 -0.000000 -0.100000 -0.500000 1.000000
In [ ]:
# Correlation of every variable with the target, without adding a column to `correlaciones`
class_vs_variables = correlaciones[['class']].rename(columns={'class': 'class_vs_variables'})
class_vs_variables.sort_values(by='class_vs_variables', ascending=False).style.background_gradient(cmap='coolwarm')
Out[ ]:
  class_vs_variables
class 1.000000
duration 0.400000
poutcome_success 0.300000
month_mar 0.200000
job_student 0.100000
month_sep 0.100000
contact_cellular 0.100000
month_dec 0.100000
default_no 0.100000
job_retired 0.100000
month_oct 0.100000
age 0.100000
cons.conf.idx 0.100000
day_of_week_fri -0.000000
poutcome_failure 0.000000
default_yes -0.000000
housing_no 0.000000
housing_unknown -0.000000
housing_yes 0.000000
loan_no 0.000000
loan_unknown -0.000000
loan_yes -0.000000
month_apr 0.000000
day_of_week_mon 0.000000
month_aug -0.000000
month_jul -0.000000
month_jun 0.000000
day_of_week_wed -0.000000
day_of_week_tue -0.000000
month_nov -0.000000
day_of_week_thu 0.000000
education_unknown 0.000000
education_university.degree 0.000000
job_unemployed 0.000000
job_entrepreneur -0.000000
job_management -0.000000
job_self-employed -0.000000
job_services -0.000000
job_technician 0.000000
job_admin. 0.000000
job_unknown -0.000000
marital_divorced -0.000000
marital_married -0.000000
job_housemaid -0.000000
marital_single 0.000000
marital_unknown -0.000000
education_basic.4y -0.000000
education_basic.6y -0.000000
education_basic.9y -0.000000
education_high.school -0.000000
education_illiterate -0.000000
education_professional.course 0.000000
job_blue-collar -0.100000
cons.price.idx -0.100000
campaign -0.100000
month_may -0.100000
default_unknown -0.100000
contact_telephone -0.100000
poutcome_nonexistent -0.200000
nr.employed -0.300000
euribor3m -0.300000
emp.var.rate -0.300000
In [ ]:
threshold = 0.7
In [ ]:
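# Find highly correlated variables (upper triangle only, so each pair is checked once).
# np.bool was removed in NumPy 1.24, so the built-in bool is used instead.
upper_triangle = correlaciones.where(np.triu(np.ones(correlaciones.shape), k=1).astype(bool))
# Only columns that exist in df_encoded are considered, so helper columns cannot break the drop.
highly_correlated_vars = [
    column for column in upper_triangle.columns
    if column in df_encoded.columns and (upper_triangle[column] > threshold).any()
]

# Drop the highly correlated variables
df_encoded.drop(columns=highly_correlated_vars, inplace=True)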
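The filter above only flags pairs whose correlation exceeds the threshold in the positive direction, so strongly negative pairs such as contact_cellular and contact_telephone (-1.0 in the matrix) both survive. A minimal sketch of a variant that uses absolute correlations, on toy data; `drop_highly_correlated` and the toy column names are hypothetical and not part of the notebook:

```python
import numpy as np
import pandas as pd

def drop_highly_correlated(df, threshold=0.7):
    """Drop the second member of every pair with |correlation| > threshold."""
    corr = df.corr().abs()
    # keep only the strict upper triangle so each pair is inspected once
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop

# toy data: "telephone" is the exact complement of "cellular"
toy = pd.DataFrame({
    "cellular":  [1, 0, 1, 1, 0, 0],
    "telephone": [0, 1, 0, 0, 1, 1],
    "duration":  [10, 20, 30, 40, 50, 60],
})

reduced, dropped = drop_highly_correlated(toy)
print("dropped:", dropped)
print("kept:", list(reduced.columns))
```

Whether to use absolute values depends on the goal: for removing redundant one-hot pairs it is usually the more natural choice, but it would change which columns reach the models below.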
In [ ]:
# Compute the absolute correlation of each variable with the target "class"
class_correlation = df_encoded.corr().abs()['class']

# Minimum correlation threshold
threshold = 0.1

# Variables whose correlation with "class" is below the threshold
low_correlation_vars = class_correlation[class_correlation < threshold].index.tolist()

# Drop the weakly correlated variables
df_encoded.drop(columns=low_correlation_vars, inplace=True)
In [ ]:
df_encoded
Out[ ]:
class duration emp.var.rate contact_cellular contact_telephone month_dec month_mar month_may month_oct month_sep poutcome_nonexistent poutcome_success
0 0 487 -1.8 1 0 0 0 1 0 0 1 0
1 0 346 1.1 0 1 0 0 1 0 0 1 0
2 0 227 1.4 0 1 0 0 0 0 0 1 0
3 0 17 1.4 0 1 0 0 0 0 0 1 0
4 0 58 -0.1 1 0 0 0 0 0 0 1 0
... ... ... ... ... ... ... ... ... ... ... ... ...
4114 0 53 1.4 1 0 0 0 0 0 0 1 0
4115 0 219 1.4 0 1 0 0 0 0 0 1 0
4116 0 64 -1.8 1 0 0 0 1 0 0 0 0
4117 0 528 1.4 1 0 0 0 0 0 0 1 0
4118 0 175 -0.1 1 0 0 0 0 0 0 1 0

4119 rows × 12 columns

Balancing the DataFrame¶

In [ ]:
# plot of original data
sns.countplot(x = df_encoded['class'])
Out[ ]:
<Axes: xlabel='class', ylabel='count'>

Splitting the DataFrame into training and test sets¶

In [ ]:
# separate dataset into train and test
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    df_encoded.drop(labels=['class'], axis=1),  # drop the target
    df_encoded['class'],  # just the target
    test_size=0.3,
    random_state=0)

X_train.shape, X_test.shape
Out[ ]:
((2883, 11), (1236, 11))
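The split above is purely random, so the share of subscribers can differ slightly between the training and test sets. One option, sketched here on synthetic labels (the array names are illustrative only), is the `stratify` argument of `train_test_split`, which preserves the class proportions in both parts:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# synthetic, imbalanced labels: 90 negatives and 10 positives
X_toy = np.arange(100).reshape(-1, 1)
y_toy = np.array([0] * 90 + [1] * 10)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy,
    test_size=0.3,
    random_state=0,
    stratify=y_toy,  # keep the 90/10 ratio in both train and test
)

train_pos, test_pos = int(y_tr.sum()), int(y_te.sum())
print("positives in train:", train_pos, "of", len(y_tr))
print("positives in test: ", test_pos, "of", len(y_te))
```

This is a suggestion rather than what the notebook does; the results reported below come from the non-stratified split.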

Balancing the training set (the test set keeps its original class distribution)¶

In [ ]:
# set up the random undersampling class
rus = RandomUnderSampler(
    sampling_strategy='auto',  # samples only from majority class
    random_state=0,  # for reproducibility
    replacement=True # if it should resample with replacement
)  

X_resampled, y_resampled = rus.fit_resample(X_train, y_train)
In [ ]:
# size of undersampled data

X_resampled.shape, y_resampled.shape
Out[ ]:
((658, 11), (658,))
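With `sampling_strategy='auto'`, the majority class is reduced to the size of the minority class, which is why the resampled training set has 658 rows (329 per class). The idea can be sketched with NumPy alone; `random_undersample` is a hypothetical helper for illustration, not imblearn's implementation, and it samples without replacement for simplicity:

```python
import numpy as np

def random_undersample(y, random_state=0):
    """Return sorted row indices that keep an equal number of rows per class."""
    rng = np.random.default_rng(random_state)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()  # size of the smallest class
    keep = []
    for c in classes:
        idx = np.flatnonzero(y == c)
        # draw n_min rows from every class (the minority class is kept whole)
        keep.append(rng.choice(idx, size=n_min, replace=False))
    return np.sort(np.concatenate(keep))

y_toy = np.array([0] * 90 + [1] * 10)
idx = random_undersample(y_toy)
balanced_counts = np.bincount(y_toy[idx])
print(balanced_counts)
```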
In [ ]:
# plot of the undersampled training data
sns.countplot(x = y_resampled)
Out[ ]:
<Axes: xlabel='class', ylabel='count'>

Data Modeling (Machine Learning)¶

  • Logistic Regression¶

In [ ]:
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression

logreg_params = {
    "penalty": ['l2'],
    "C": [0.001, 0.01, 0.1, 1, 10, 100]
}

gcv_logreg = GridSearchCV(LogisticRegression(solver='lbfgs', random_state=1),
                          param_grid=logreg_params,
                          cv=10,
                          scoring='f1')

logreg_gcv = gcv_logreg.fit(X_resampled, y_resampled)
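After fitting, `GridSearchCV` exposes the selected hyperparameters and the mean cross-validated score through `best_params_` and `best_score_`, which the notebook does not print. A small sketch on synthetic data from `make_classification` (the data and variable names are illustrative only, not the bank marketing set):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# synthetic binary classification problem, used only for demonstration
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

param_grid = {"C": [0.01, 0.1, 1, 10]}
search = GridSearchCV(
    LogisticRegression(solver="lbfgs", max_iter=1000),
    param_grid=param_grid,
    cv=5,
    scoring="f1",
)
search.fit(X_demo, y_demo)

print("Best C:", search.best_params_["C"])
print("Mean cross-validated F1:", round(search.best_score_, 3))
```

In the notebook the equivalent attributes would be `gcv_logreg.best_params_` and `gcv_logreg.best_score_`.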
In [ ]:
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.metrics import f1_score, accuracy_score, recall_score, precision_score, roc_auc_score

logreg_pred = logreg_gcv.best_estimator_.predict(X_test)

print(classification_report(y_test, logreg_pred))
print('Accuracy Score: ',accuracy_score(y_test,logreg_pred))
print(f'F1 Score: {f1_score(y_test,logreg_pred)}')
              precision    recall  f1-score   support

           0       0.97      0.85      0.91      1114
           1       0.36      0.80      0.50       122

    accuracy                           0.84      1236
   macro avg       0.67      0.82      0.70      1236
weighted avg       0.91      0.84      0.86      1236

Accuracy Score:  0.8406148867313916
F1 Score: 0.4961636828644501
  • Naive Bayes¶

In [ ]:
from sklearn.naive_bayes import GaussianNB

nb_params = {"var_smoothing": np.logspace(0, -9, num = 100)}

gcv_nb = GridSearchCV(GaussianNB(), 
                   param_grid = nb_params, 
                   cv = 10, 
                   scoring = 'f1')


nb_gcv = gcv_nb.fit(X_resampled, y_resampled)
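`np.logspace(0, -9, num=100)` produces 100 candidate values for `var_smoothing`, evenly spaced on a logarithmic scale from 10^0 down to 10^-9. A quick check of what the grid contains (variable names here are illustrative):

```python
import numpy as np

grid = np.logspace(0, -9, num=100)

print("number of candidates:", len(grid))
print("largest value:", grid[0])
print("smallest value:", grid[-1])

# on a log scale, neighbouring candidates differ by a constant factor
ratios = grid[1:] / grid[:-1]
print("step factor:", ratios[0])
```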
In [ ]:
nb_pred = nb_gcv.best_estimator_.predict(X_test)

print(classification_report(y_test, nb_pred))
print('Accuracy Score: ',accuracy_score(y_test,nb_pred))
print(f'F1 Score: {f1_score(y_test,nb_pred)}\n')
              precision    recall  f1-score   support

           0       0.96      0.89      0.92      1114
           1       0.38      0.63      0.47       122

    accuracy                           0.86      1236
   macro avg       0.67      0.76      0.70      1236
weighted avg       0.90      0.86      0.88      1236

Accuracy Score:  0.86084142394822
F1 Score: 0.47239263803680975

  • Decision Tree¶

In [ ]:
from sklearn.tree import DecisionTreeClassifier

dt_params = {"criterion": ['gini', 'entropy'],
            "max_depth": np.arange(3, 15)}

gcv_dt = GridSearchCV(DecisionTreeClassifier(random_state=1), 
                   param_grid = dt_params, 
                   cv = 10, 
                   scoring = 'f1')


dt_gcv = gcv_dt.fit(X_resampled, y_resampled)
In [ ]:
dt_pred = dt_gcv.best_estimator_.predict(X_test)
print(classification_report(y_test, dt_pred))
print('Accuracy Score: ',accuracy_score(y_test,dt_pred))
print(f'F1 Score: {f1_score(y_test,dt_pred)}\n')
              precision    recall  f1-score   support

           0       0.97      0.81      0.88      1114
           1       0.31      0.80      0.44       122

    accuracy                           0.80      1236
   macro avg       0.64      0.80      0.66      1236
weighted avg       0.91      0.80      0.84      1236

Accuracy Score:  0.8042071197411004
F1 Score: 0.44495412844036697

  • Confusion Matrix¶

In [ ]:
import matplotlib
import matplotlib.pyplot as plt  # pyplot provides figure() and show()

# NOTE: `Colors` is assumed to be a palette helper defined earlier in the notebook
colors = ['lightgray'] + [Colors.Teal] * 7
colormap = matplotlib.colors.LinearSegmentedColormap.from_list("", colors)

fig = plt.figure(figsize=(10, 14), facecolor=Colors.LightGray)  # create figure
gs = fig.add_gridspec(4, 2, wspace=0.1, hspace=0.8)

# one full-width axis per model, stacked vertically
ax0, ax1, ax2 = (fig.add_subplot(gs[row, :]) for row in range(3))
for ax in (ax0, ax1, ax2):
    ax.set_facecolor(Colors.LightGray)
    ax.tick_params(axis='both', which='both', length=0)

# decision tree
dt_cm = confusion_matrix(y_test, dt_pred)
sns.heatmap(dt_cm, cmap=colormap, annot=True, fmt="d", linewidths=5, cbar=False, ax=ax0,
            yticklabels=['Actual Non-Subscribed', 'Actual Subscribed'],
            xticklabels=['Predicted Non-Subscribed', 'Predicted Subscribed'], annot_kws={"fontsize": 12})

# naive bayes
nb_cm = confusion_matrix(y_test, nb_pred)
sns.heatmap(nb_cm, cmap=colormap, annot=True, fmt="d", linewidths=5, cbar=False, ax=ax1,
            yticklabels=['Actual Non-Subscribed', 'Actual Subscribed'],
            xticklabels=['Predicted Non-Subscribed', 'Predicted Subscribed'], annot_kws={"fontsize": 12})

# logistic regression
logreg_cm = confusion_matrix(y_test, logreg_pred)
sns.heatmap(logreg_cm, cmap=colormap, annot=True, fmt="d", linewidths=5, cbar=False, ax=ax2,
            yticklabels=['Actual Non-Subscribed', 'Actual Subscribed'],
            xticklabels=['Predicted Non-Subscribed', 'Predicted Subscribed'], annot_kws={"fontsize": 12})

# annotations
ax0.text(0, -0.75, 'Decision Tree Performance', fontsize=18, fontweight='bold', fontfamily='serif')
ax0.text(0, -0.2, 'Recall for subscribers is good (0.80), matching logistic regression. \nPrecision and accuracy are the lowest of the three models.',
         fontsize=14, fontfamily='serif')

ax1.text(0, -0.75, 'Naive Bayes Performance', fontsize=18, fontweight='bold', fontfamily='serif')
ax1.text(0, -0.2, 'This model has the highest accuracy (share of correct predictions over all predictions).\nHowever, it produces a large number of false negatives.',
         fontsize=14, fontfamily='serif')

ax2.text(0, -0.75, 'Logistic Regression Performance', fontsize=18, fontweight='bold', fontfamily='serif')
ax2.text(0, -0.2, 'Recall matches the decision tree (0.80), with slightly higher precision. \nRecall is good.',
         fontsize=14, fontfamily='serif')

plt.show()
  • Model Comparison¶

In [ ]:
# Make dataframes to plot
def dataframe_to_plot(title, classification_cm) -> pd.DataFrame:
    """Return F1, accuracy, recall and precision of the positive class as a one-column DataFrame."""
    tn, fp, fn, tp = classification_cm.ravel()

    accuracy = (tp + tn) / (tn + tp + fn + fp)
    sensitivity = tp / (fn + tp)  # recall of the positive class
    precision = tp / (tp + fp)
    f1 = 2 * (1 / ((1 / precision) + (1 / sensitivity)))  # harmonic mean of precision and recall

    scores = pd.DataFrame(data=[f1, accuracy, sensitivity, precision],
                          columns=[title],
                          index=["F1", "Accuracy", "Recall", "Precision"])

    return scores


logreg_df = dataframe_to_plot('Tuned Logistic Regression Score', logreg_cm)
navbayes_df = dataframe_to_plot('Tuned Naive Bayes Score', nb_cm)
dectree_df = dataframe_to_plot('Tuned Decision Tree Score', dt_cm)
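The formulas in `dataframe_to_plot` can be checked by hand. The counts below are not printed anywhere in the notebook; they are back-calculated from the logistic regression report (1236 test rows, 122 positives, accuracy 0.8406..., F1 0.4961...) and should be read as an assumption. `metrics_from_counts` is a hypothetical helper that mirrors the function above in plain Python:

```python
def metrics_from_counts(tn, fp, fn, tp):
    """Compute the four scores from confusion-matrix counts (positive class = 1)."""
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"F1": f1, "Accuracy": accuracy, "Recall": recall, "Precision": precision}

# assumed counts, reconstructed from the reported accuracy and F1
logreg_counts = dict(tn=942, fp=172, fn=25, tp=97)

scores = metrics_from_counts(**logreg_counts)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Under these counts, the low F1 comes from the 172 false positives: most actual subscribers are found, but roughly two out of three predicted subscribers are not real ones.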
In [ ]:
df_models = round(pd.concat([logreg_df, navbayes_df, dectree_df], axis=1),3)
colors = [Colors.LightGray, Colors.LightCyan, Colors.Teal]
colormap = matplotlib.colors.LinearSegmentedColormap.from_list("", colors)


fig = plt.figure(figsize=(10,8),dpi=100, facecolor=Colors.LightGray) # create figure
gs = fig.add_gridspec(3, 2, wspace=0.1, hspace=0.5)

ax0 = fig.add_subplot(gs[0:1, :])

sns.heatmap(df_models.T, cmap=colormap, annot=True,fmt=".1%",vmin=0,vmax=0.95, linewidths=2.5,cbar=False,ax=ax0,annot_kws={"fontsize":12})
fig.patch.set_facecolor(Colors.LightGray) 
ax0.set_facecolor(Colors.LightGray)
ax0.tick_params(axis=u'both', which=u'both',length=0)

ax0.text(0,-0.5,'Model Comparison',fontsize=20,fontweight='bold',fontfamily='serif')

plt.show()

Conclusions¶

  • Logistic Regression:¶

  • Precision for class 0 is high (97%): when the model predicts "no subscription", it is almost always right.
  • Recall for class 1 is high (80%): the model identifies most of the actual subscribers.
  • The F1 score for class 1 is low (50%) because precision for class 1 is only 36% while recall is 80%, so many predicted subscribers are false positives.
  • Overall, accuracy is 84% and the class-1 F1 score is the highest of the three models; recall for class 0 (85%) is lower than its precision, meaning about 15% of non-subscribers are flagged as subscribers.
  • Naive Bayes:¶

  • Precision for class 0 is high (96%), so its negative predictions are reliable.
  • Recall for class 1 is moderate (63%): the model finds fewer than two-thirds of the actual subscribers, the lowest recall of the three models.
  • The F1 score for class 1 is low (47%), close to that of Logistic Regression.
  • Overall, the model has the highest accuracy (86%), but its F1 score and recall for class 1 are low.
  • Decision Tree:¶

  • Precision for class 0 is high (97%), similar to the previous models.
  • Recall for class 1 is high (80%), the same as Logistic Regression.
  • The F1 score for class 1 is 44%, the lowest of the three models, because precision for class 1 is only 31%.
  • Overall, recall is good, but accuracy (80%) and the class-1 F1 score are the lowest of the three.
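  • Final conclusion:¶

Based on these metrics, and weighing precision, recall and F1 score together, Logistic Regression performs best overall: it has the highest class-1 F1 score (0.50) and matches the Decision Tree's recall (0.80) while keeping higher accuracy (0.84 versus 0.80). Naive Bayes is the most accurate (0.86) but misses more subscribers (recall 0.63).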
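
The ranking behind this conclusion can be reproduced from the class-1 precision and recall shown in the three classification reports. A short sketch using those rounded values, so the resulting F1 scores agree with the reported ones only approximately; `f1_from` is a hypothetical helper:

```python
# (precision, recall) for class 1, as rounded in the classification reports
reported = {
    "Logistic Regression": (0.36, 0.80),
    "Naive Bayes": (0.38, 0.63),
    "Decision Tree": (0.31, 0.80),
}

def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

f1_scores = {model: f1_from(p, r) for model, (p, r) in reported.items()}

# sort models from best to worst class-1 F1
ranking = sorted(f1_scores, key=f1_scores.get, reverse=True)
for model in ranking:
    print(f"{model}: F1 = {f1_scores[model]:.2f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, a recall of 0.80 cannot compensate for a precision near 0.3, which is why all three F1 scores stay at or below about 0.5.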